
fix: definition coverage gaps in words hub and 5 small languages #135

Merged
Hugo0 merged 1 commit into main from
fix/definition-coverage-gaps
Feb 26, 2026

@Hugo0
Owner

@Hugo0 Hugo0 commented Feb 26, 2026

Summary

  • Words hub + word pages (/<lang>/words, /<lang>/word/<id>) only read definitions from the disk cache, completely missing the 206K pre-built kaikki definitions that are already loaded in memory. This caused most words on the words hub to show no definition — e.g. Spanish showed definitions for only 3 out of 7,405 words.
  • 5 small/constructed languages (Palauan, Interlingue, Kinyarwanda, Klingon, Quenya) had zero definition coverage — they fall through all 4 tiers (no kaikki files, UNRELIABLE parser confidence, not in LLM allowlist).

Changes

  1. app.py: Both language_words_hub() and word_page() now fall back to lookup_kaikki_native() → lookup_kaikki_english() on disk cache miss. These are in-memory dict lookups (zero latency, no network calls).
  2. wiktionary.py: Added pau, ie, rw, tlh, qya to LLM_LANG_NAMES so they can use the gpt-4o-mini fallback.
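
The lookup order in change 1 can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: only the names lookup_kaikki_native and lookup_kaikki_english come from this PR, while the two dicts, the function bodies, the return shape (a dict with a "definition" key), and the resolve_definition helper are assumptions made for the example.

```python
# Hypothetical stand-ins for the pre-built kaikki data held in memory.
# The real structures in wiktionary.py may look different.
_NATIVE = {("es", "gato"): {"definition": "felino"}}
_ENGLISH = {("es", "perro"): {"definition": "dog"}}


def lookup_kaikki_native(word, lang_code):
    """Assumed behaviour: return a definition dict, or None on a miss."""
    return _NATIVE.get((lang_code, word.lower()))


def lookup_kaikki_english(word, lang_code):
    """Assumed behaviour: return a definition dict, or None on a miss."""
    return _ENGLISH.get((lang_code, word.lower()))


def resolve_definition(word, lang_code, cached=None):
    """Prefer the disk-cache result, then kaikki native, then kaikki English.

    `cached` is whatever was read from the disk cache (or None on a miss).
    """
    if cached and cached.get("definition"):
        return cached
    return (
        lookup_kaikki_native(word, lang_code)
        or lookup_kaikki_english(word, lang_code)
    )
```

With these stub dicts, resolve_definition("gato", "es") is served by the native tier, resolve_definition("perro", "es") by the English tier, and an unknown word yields None.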

Test plan

  • All 2,250 tests pass, 198 parser tests pass
  • Visit /es/words — definitions should now appear for all words with kaikki entries
  • Visit /pau/words — LLM definitions should be fetchable via individual word pages
  • Visit /en/word/<id> — definition should server-render without needing JS fallback fetch

Summary by CodeRabbit

  • New Features
    • Enhanced definition retrieval across word hub and word pages with automatic fallback when cached definitions are unavailable.
    • Added support for 5 new languages: Palauan, Interlingue, Kinyarwanda, Klingon, and Quenya.

The words hub (/<lang>/words) and individual word pages (/<lang>/word/<id>)
only read from the disk cache for definitions, missing the 206K pre-built
kaikki definitions that exist in memory. This caused most words on the
words hub to show no definition (e.g. Spanish had 3/7405 cached).

Now both routes fall back to kaikki native → kaikki English lookups
on cache miss. These are in-memory dict lookups with zero latency.

Also adds 5 zero-coverage languages (pau, ie, rw, tlh, qya) to the
LLM fallback allowlist — they had no kaikki files, UNRELIABLE parser
confidence, AND were missing from the LLM allowlist, so every
definition request returned None.
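
Why those five languages fell through every tier can be illustrated with a small sketch. Only LLM_LANG_NAMES and the five language codes come from this commit; the fetch_definition function, its parameters, the tier order, and the stubbed parser and LLM results are hypothetical stand-ins for the real pipeline in wiktionary.py.

```python
# Illustrative subset of the allowlist before and after this commit.
LLM_LANG_NAMES_BEFORE = {"es": "Spanish"}
LLM_LANG_NAMES_AFTER = {
    **LLM_LANG_NAMES_BEFORE,
    "pau": "Palauan",
    "ie": "Interlingue",
    "rw": "Kinyarwanda",
    "tlh": "Klingon",
    "qya": "Quenya",
}


def fetch_definition(word, lang_code, llm_lang_names,
                     kaikki=None, parser_confidence="UNRELIABLE"):
    """Hypothetical tiered lookup: kaikki data, then parser, then LLM."""
    kaikki = kaikki or {}
    if (lang_code, word) in kaikki:          # tier: pre-built kaikki data
        return kaikki[(lang_code, word)]
    if parser_confidence != "UNRELIABLE":    # tier: Wiktionary parser
        return {"definition": f"<parsed definition of {word}>"}
    if lang_code in llm_lang_names:          # tier: LLM fallback allowlist
        lang = llm_lang_names[lang_code]
        # Stub result; a real implementation would call the model here.
        return {"definition": f"<LLM definition of {word} in {lang}>"}
    return None                              # every tier declined
```

Under these assumptions, a Klingon request with the old mapping returns None, and the same request with the extended mapping reaches the LLM tier.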
@Hugo0
Owner Author

Hugo0 commented Feb 26, 2026

@coderabbitai review

@coderabbitai

coderabbitai bot commented Feb 26, 2026

📝 Walkthrough

Walkthrough

The changes implement a disk-cache-first fallback strategy for definition lookups across multiple functions in the application, importing new Wiktionary lookup functions for native and English language definitions. Additionally, five new language codes are added to the language names mapping to support Palauan, Interlingue, Kinyarwanda, Klingon, and Quenya.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Definition Loading Logic (webapp/app.py) | Imports lookup_kaikki_native and lookup_kaikki_english, then updates definition-loading fallback logic in language_words_hub (2 locations) and word_page to prioritize disk-cached definitions and fall back to native, then English kaikki lookups if needed. |
| Language Support Expansion (webapp/wiktionary.py) | Adds five new language code mappings to LLM_LANG_NAMES: pau (Palauan), ie (Interlingue), rw (Kinyarwanda), tlh (Klingon), and qya (Quenya). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A rabbit hops through cached halls,
When definitions fail to call,
We seek the kaikki way—
Native first, then English's say!
Five new tongues now join the spree, 🌍
From Klingon to Palauan, we're free!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main changes: fixing definition coverage gaps by adding fallback logic to the words hub and expanding language support to 5 small languages. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@coderabbitai

coderabbitai bot commented Feb 26, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
webapp/app.py (1)

1418-1431: ⚠️ Potential issue | 🟡 Minor

Inconsistent handling of negative cache entries compared to language_words_hub.

In language_words_hub (lines 1174-1175), the code checks loaded and loaded.get("definition") before assigning, which correctly skips negative cache entries like {"not_found": True, "ts": ...}.

Here in word_page, the assignment definition = loaded if loaded else None would treat a negative cache entry as a valid definition (since loaded is truthy), causing the kaikki fallback to be skipped.

Proposed fix to align with language_words_hub logic
     if os.path.exists(cache_path):
         try:
             with open(cache_path, "r") as f:
                 loaded = json.load(f)
-                definition = loaded if loaded else None
+                if loaded and loaded.get("definition"):
+                    definition = loaded
         except Exception:
             pass
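
The difference between the two checks can be reproduced in isolation. The snippet below is a sketch: pick_before and pick_after are hypothetical names that wrap the old and proposed expressions from the diff above, and the ts value is made up for illustration.

```python
# A negative cache entry as described in the review, and a normal entry.
negative_entry = {"not_found": True, "ts": 1700000000}  # ts is illustrative
positive_entry = {"definition": "a small domesticated feline"}


def pick_before(loaded):
    """Old word_page logic: any non-empty dict counts as a definition."""
    return loaded if loaded else None


def pick_after(loaded):
    """Proposed logic: require a non-empty "definition" field."""
    if loaded and loaded.get("definition"):
        return loaded
    return None
```

Because a non-empty dict is truthy in Python, pick_before(negative_entry) hands back the negative entry itself, so a caller would never reach the kaikki lookups. pick_after(negative_entry) returns None and lets the fallback run, while both versions accept positive_entry.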
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@webapp/app.py` around lines 1418 - 1431, The code in word_page incorrectly
treats any loaded cache JSON as a valid definition, causing negative cache
entries to short-circuit kaikki lookups; update the cache-load branch in
word_page to mirror language_words_hub by assigning definition only when loaded
and loaded.get("definition") is truthy (i.e., change the logic that sets
definition = loaded if loaded else None to require loaded.get("definition")), so
negative entries like {"not_found": true, "ts": ...} are skipped and
lookup_kaikki_native / lookup_kaikki_english still run.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99ea41c and 8ee5697.

📒 Files selected for processing (2)
  • webapp/app.py
  • webapp/wiktionary.py

@coderabbitai

coderabbitai bot commented Feb 26, 2026

📝 Walkthrough

Walkthrough

The pull request enhances definition retrieval by implementing a two-step fallback mechanism for missing cached definitions, attempting native language lookups first before falling back to English equivalents. Additionally, five new languages (Palauan, Interlingue, Kinyarwanda, Klingon, Quenya) are registered in the language-name mapping system.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Definition Fallback Logic (webapp/app.py) | Imported lookup_kaikki_native and lookup_kaikki_english functions; integrated two-step fallback mechanism in word page and language hub routes to retrieve definitions from native language lookups first, then English lookups if cache misses. |
| Language Registry Expansion (webapp/wiktionary.py) | Added five new language entries to LLM_LANG_NAMES constant: pau (Palauan), ie (Interlingue), rw (Kinyarwanda), tlh (Klingon), qya (Quenya). |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 Hop, hop, the definitions flow!
From cache to Kaikki, off they go,
Native first, then English too,
And five new tongues we've added new! 🌍✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the main changes: fixing definition coverage gaps by adding fallback lookups for the words hub and adding support for five small languages. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |



@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
webapp/app.py (1)

1418-1431: The fallback logic is correctly duplicated from language_words_hub.

The implementation is consistent with the hub page, which is good.

Consider extracting the shared definition-loading logic (disk cache + kaikki fallbacks) into a helper function to reduce duplication. This would consolidate the pattern used in both language_words_hub and word_page.

♻️ Optional: Extract shared definition loading to a helper
def _load_definition_with_fallback(word, lang_code):
    """Load definition from disk cache, falling back to kaikki pre-built dicts."""
    cache_path = os.path.join(WORD_DEFS_DIR, lang_code, f"{word.lower()}.json")
    if os.path.exists(cache_path):
        try:
            with open(cache_path, "r") as f:
                loaded = json.load(f)
                if loaded and loaded.get("definition"):
                    return loaded
        except Exception:
            pass
    definition = lookup_kaikki_native(word, lang_code)
    if not definition:
        definition = lookup_kaikki_english(word, lang_code)
    return definition

Then use definition = _load_definition_with_fallback(word, lang_code) in both routes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@webapp/app.py` around lines 1418 - 1431, Extract the duplicated disk-cache +
kaikki fallback logic into a helper function (e.g.
_load_definition_with_fallback(word, lang_code)) that: checks the cache path
under WORD_DEFS_DIR/lang_code/word.lower() and attempts json.load inside a
try/except, returns the loaded dict if valid (or None on error), otherwise calls
lookup_kaikki_native(word, lang_code) and then lookup_kaikki_english(word,
lang_code) as fallbacks and returns the first non-None result; then replace the
duplicated blocks in word_page and language_words_hub with a single call to this
helper to obtain definition.


@Hugo0 Hugo0 merged commit 3b3ef53 into main Feb 26, 2026
4 checks passed